CUMFREQ

Overview

The CUMFREQ function computes a cumulative frequency histogram for a dataset: for each bin, it reports the number of observations that fall in that bin or in any earlier bin. This is useful for understanding how data accumulates across a distribution and for constructing cumulative distribution visualizations.

A cumulative frequency histogram differs from a standard histogram in that each bin value represents the total count of all observations from the first bin through the current bin, rather than just the count within that bin. This makes it easy to answer questions like “how many observations fall below a certain threshold?” or to identify percentile boundaries in continuous data.
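The relationship between the two kinds of histogram can be sketched in a few lines of plain Python. The per-bin counts below are hypothetical values chosen for illustration, not output from CUMFREQ.

```python
from itertools import accumulate

# Per-bin counts of a standard histogram (hypothetical values)
bin_counts = [3, 5, 2, 4]

# Cumulative frequencies: running total of the per-bin counts
cumulative = list(accumulate(bin_counts))

print(cumulative)  # [3, 8, 10, 14]
```

The last cumulative value always equals the total number of counted observations, here 14.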

The function wraps scipy.stats.cumfreq from SciPy, a widely used scientific computing library for Python. The underlying implementation first sorts the data into equal-width bins, then computes the cumulative sum of the bin counts.

The numbins parameter controls the granularity of the histogram: more bins give finer resolution but need more data points for the counts to be statistically meaningful. If no range limits are specified, the function derives the histogram range from the data, using a slightly expanded range calculated as:

\text{range} = \left( \min(a) - s, \max(a) + s \right)

where s = \frac{1}{2} \cdot \frac{\max(a) - \min(a)}{\text{numbins} - 1}

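This expansion makes each bin exactly 2s wide and places the minimum and maximum values at the centers of the first and last bins rather than on their outer edges, so every data point falls cleanly within the histogram range. When lowerlimit and upperlimit are provided, the function uses those values as the explicit range boundaries instead. The two limits must be given together, and data points outside an explicit range are not counted.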
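As a worked instance of the formula above, the following sketch computes the default range in plain Python. The helper name default_range is hypothetical and is not part of SciPy. It rejects numbins below 2 because the formula divides by numbins - 1; how SciPy itself behaves in that case is not covered here.

```python
def default_range(values, numbins):
    """Return (lower, upper) per the formula above. Illustrative helper, not a SciPy function."""
    if numbins < 2:
        raise ValueError("the formula divides by numbins - 1, so numbins must be at least 2")
    lo, hi = min(values), max(values)
    s = 0.5 * (hi - lo) / (numbins - 1)
    return lo - s, hi + s

lower, upper = default_range([1, 2, 3, 4], numbins=4)
print(lower, upper)         # 0.5 4.5
print((upper - lower) / 4)  # 1.0 (width of each of the 4 bins)
```

For the data 1, 2, 3, 4 with 4 bins, s is 0.5, so the bins are centered on 1, 2, 3, and 4.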

For more information, see the SciPy statistics documentation and the scipy.stats.cumfreq source code.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=CUMFREQ(data, numbins, lowerlimit, upperlimit)
  • data (list[list], required): Input data to analyze as a 2D list of numeric values.
  • numbins (int, optional, default: 10): Number of bins to use for the histogram.
  • lowerlimit (float, optional, default: null): Lower bound for the histogram range.
  • upperlimit (float, optional, default: null): Upper bound for the histogram range.

Returns (list[list]): 2D list of cumulative frequencies, or error message string.
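The 2D shapes of the input and output can be illustrated with a short sketch. It assumes that a worksheet range reaches Python as one inner list per row, and the cumulative counts shown are hypothetical placeholders rather than computed results.

```python
# A 2x2 range arrives as a 2D list: one inner list per worksheet row
data = [[1, 2], [3, 4]]

# Values are flattened row by row into a 1D list of floats before binning
flat = [float(value) for row in data for value in row]
print(flat)  # [1.0, 2.0, 3.0, 4.0]

# Placeholder cumulative counts, reshaped into a single column for Excel
counts = [1.0, 2.0, 3.0, 4.0]
column = [[c] for c in counts]
print(column)  # [[1.0], [2.0], [3.0], [4.0]]
```

The result is always a single column with one row per bin, regardless of the shape of the input range.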

Examples

Example 1: Default range with 4 bins

Inputs:

data numbins
1 4
2
3
4

Excel formula:

=CUMFREQ({1;2;3;4}, 4)

Expected output:

Result
1
2
3
4

Example 2: Duplicate values with 2 bins

Inputs:

data numbins
1 2
1
2
2

Excel formula:

=CUMFREQ({1;1;2;2}, 2)

Expected output:

Result
2
4

Example 3: Five data points with 5 bins

Inputs:

data numbins
1 5
2
3
4
5

Excel formula:

=CUMFREQ({1;2;3;4;5}, 5)

Expected output:

Result
1
2
3
4
5

Example 4: Custom range with lower and upper limits

Inputs:

data numbins lowerlimit upperlimit
1 2 0 4
2
3

Excel formula:

=CUMFREQ({1;2;3}, 2, 0, 4)

Expected output:

Result
1
3
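Example 4 can be checked by hand with a small pure-Python sketch. The helper cumulative_counts is hypothetical and is not how SciPy implements the calculation. It assumes equal-width bins that are half-open on the right, except for the last bin, which also includes the upper limit, and it ignores floating-point edge cases at bin boundaries.

```python
from itertools import accumulate

def cumulative_counts(values, numbins, lower, upper):
    """Equal-width binning over [lower, upper], then a running total (illustrative helper)."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    width = (upper - lower) / numbins
    counts = [0] * numbins
    for v in values:
        if v < lower or v > upper:
            continue  # values outside the range are not counted
        # The last bin is closed on the right, so v == upper lands in it
        index = min(int((v - lower) / width), numbins - 1)
        counts[index] += 1
    return list(accumulate(counts))

print(cumulative_counts([1, 2, 3], 2, 0, 4))  # [1, 3]
```

With limits 0 and 4 and 2 bins, the value 1 falls in the first bin [0, 2) and the values 2 and 3 fall in the second bin [2, 4], giving per-bin counts of 1 and 2 and cumulative counts of 1 and 3.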

Python Code

from scipy.stats import cumfreq as scipy_cumfreq

def cumfreq(data, numbins=10, lowerlimit=None, upperlimit=None):
    """
    Compute the cumulative frequency histogram for the input data.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cumfreq.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        data (list[list]): Input data to analyze as a 2D list of numeric values.
        numbins (int, optional): Number of bins to use for the histogram. Default is 10.
        lowerlimit (float, optional): Lower bound for the histogram range. Default is None.
        upperlimit (float, optional): Upper bound for the histogram range. Default is None.

    Returns:
        list[list]: 2D list of cumulative frequencies, or error message string.
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    # Normalize and validate data
    data = to2d(data)
    try:
        if not isinstance(data, list) or not all(isinstance(row, list) for row in data):
            return "Invalid input: data must be a 2D list or scalar."
        flat = []
        for row in data:
            for val in row:
                flat.append(float(val))
    except Exception:
        return "Invalid input: data must be numeric."
    # Require at least two data points
    if len(flat) < 2:
        return "Invalid input: data must contain at least two values."
    # Validate numbins
    try:
        nb = int(numbins)
        if nb <= 0:
            return "Invalid input: numbins must be a positive integer."
    except Exception:
        return "Invalid input: numbins must be an integer."
    # Validate limits
    drl = None
    if lowerlimit is not None or upperlimit is not None:
        if lowerlimit is None or upperlimit is None:
            return "Invalid input: both lowerlimit and upperlimit must be provided."
        try:
            low = float(lowerlimit)
            up = float(upperlimit)
        except Exception:
            return "Invalid input: limits must be numeric."
        drl = (low, up)
    # Compute cumulative frequency
    try:
        # defaultreallimits=None lets SciPy derive the range from the data
        res = scipy_cumfreq(flat, numbins=nb, defaultreallimits=drl)
        counts = res.cumcount.tolist()
    except Exception as e:
        return f"scipy.stats.cumfreq error: {e}"
    # Convert to 2D list
    return [[c] for c in counts]
